
Replace Command with CommandSchedule #20

Merged
jonathanvdc merged 33 commits into main from parallel-rule-iteration-fusion on Nov 9, 2025


jonathanvdc (Owner) commented Nov 8, 2025

This pull request replaces the Command class hierarchy with CommandSchedule so that command batches can be formed on the fly, which parallelizes a previously sequential step.
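To make "on-the-fly command batch formation" concrete, here is a minimal, generic Python sketch. It is not the project's CommandSchedule: `Cmd`, `form_batches`, and `run_schedule` are hypothetical names, and the sketch assumes each command declares the symbolic ids it reads and the single id it defines. Under that assumption, every command can be placed in the earliest batch that follows all of its dependencies, and the commands inside one batch can run in parallel instead of one after another.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Cmd:
    """Hypothetical command for illustration; NOT the project's Command class."""
    name: str
    inputs: tuple[str, ...] = ()  # symbolic ids this command reads
    output: Optional[str] = None  # symbolic id this command defines (at most once)


def form_batches(commands: list[Cmd]) -> list[list[Cmd]]:
    """Place each command in the earliest batch after all of its dependencies.

    Ids not defined by any earlier command are assumed to exist already,
    so a command that only reads such ids lands in batch 0.
    """
    batch_of: dict[str, int] = {}  # symbolic id -> index of the batch defining it
    batches: list[list[Cmd]] = []
    for cmd in commands:
        idx = max((batch_of[i] + 1 for i in cmd.inputs if i in batch_of), default=0)
        while len(batches) <= idx:
            batches.append([])
        batches[idx].append(cmd)
        if cmd.output is not None:
            batch_of[cmd.output] = idx
    return batches


def run_schedule(batches: list[list[Cmd]], execute: Callable[[Cmd], object],
                 max_workers: int = 2) -> list:
    """Run batches in order; commands inside one batch run concurrently."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batches:
            # extend() drains pool.map, so the whole batch finishes before the
            # next batch starts; pool.map also keeps results in input order.
            results.extend(pool.map(execute, batch))
    return results


# Usage: build (x*y) + (x+1) from scratch.
cmds = [
    Cmd("add x", output="x"),
    Cmd("add y", output="y"),
    Cmd("add x*y", inputs=("x", "y"), output="xy"),
    Cmd("add x+1", inputs=("x",), output="x1"),
    Cmd("add (x*y)+(x+1)", inputs=("xy", "x1"), output="sum"),
]
print([[c.name for c in b] for b in form_batches(cmds)])
# [['add x', 'add y'], ['add x*y', 'add x+1'], ['add (x*y)+(x+1)']]
```

Five commands collapse into three batches, so only three sequential steps remain. The sketch ignores side effects that bypass the declared ids (for example, merges that touch shared state); a real schedule would need extra ordering for those. Python threads are used only to show the structure; actual speedup depends on the runtime.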

github-actions bot commented Nov 8, 2025

JMH comparison (baseline vs PR)

| Benchmark | Params | Baseline | PR | Δ (PR/Base) | Unit |
|---|---|---|---|---|---|
| IncrementalBenchmarks.incrementalPolynomial | depth=6, mutableEGraph=false, size=500, threadCount=1 | 210.5 | 194.29 | 0.923× (-7.7%) | ms/op |
| IncrementalBenchmarks.incrementalPolynomial | depth=6, mutableEGraph=false, size=500, threadCount=2 | 284.833 | 310.638 | 1.091× (+9.1%) | ms/op |
| IncrementalBenchmarks.incrementalPolynomial | depth=6, mutableEGraph=true, size=500, threadCount=1 | 5.07017 | 7.91167 | 1.560× (+56.0%) | ms/op |
| IncrementalBenchmarks.incrementalPolynomial | depth=6, mutableEGraph=true, size=500, threadCount=2 | 26.6326 | 76.5164 | 2.873× (+187.3%) | ms/op |
| IncrementalBenchmarks.oneByOnePolynomial | depth=6, mutableEGraph=false, size=500, threadCount=1 | 369.453 | 335.723 | 0.909× (-9.1%) | ms/op |
| IncrementalBenchmarks.oneByOnePolynomial | depth=6, mutableEGraph=false, size=500, threadCount=2 | 501.172 | 512.031 | 1.022× (+2.2%) | ms/op |
| IncrementalBenchmarks.oneByOnePolynomial | depth=6, mutableEGraph=true, size=500, threadCount=1 | 167.732 | 144.979 | 0.864× (-13.6%) | ms/op |
| IncrementalBenchmarks.oneByOnePolynomial | depth=6, mutableEGraph=true, size=500, threadCount=2 | 277.907 | 301.479 | 1.085× (+8.5%) | ms/op |
| LiarBenchmarks.findGemmInMm | threadCount=1 | 222.272 | 231.529 | 1.042× (+4.2%) | ms/op |
| LiarBenchmarks.findGemmInMm | threadCount=2 | 188.495 | 179.49 | 0.952× (-4.8%) | ms/op |
| LiarBenchmarks.findGemvInMv | threadCount=1 | 63.5259 | 59.8171 | 0.942× (-5.8%) | ms/op |
| LiarBenchmarks.findGemvInMv | threadCount=2 | 57.2284 | 55.1964 | 0.964× (-3.6%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=false, size=20, threadCount=1 | 5.75782 | 4.34889 | 0.755× (-24.5%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=false, size=20, threadCount=2 | 5.56953 | 4.35167 | 0.781× (-21.9%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=false, size=40, threadCount=1 | 38.3029 | 30.1172 | 0.786× (-21.4%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=false, size=40, threadCount=2 | 29.2298 | 23.7103 | 0.811× (-18.9%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=false, size=80, threadCount=1 | 290.41 | 272.661 | 0.939× (-6.1%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=false, size=80, threadCount=2 | 208.461 | 166.347 | 0.798× (-20.2%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=true, size=20, threadCount=1 | 3.49014 | 2.71994 | 0.779× (-22.1%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=true, size=20, threadCount=2 | 3.3044 | 2.43997 | 0.738× (-26.2%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=true, size=40, threadCount=1 | 24.8563 | 20.4156 | 0.821× (-17.9%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=true, size=40, threadCount=2 | 21.4311 | 15.1704 | 0.708× (-29.2%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=true, size=80, threadCount=1 | 233.612 | 174.646 | 0.748× (-25.2%) | ms/op |
| MatmulBenchmarks.nmm | mutableEGraph=true, size=80, threadCount=2 | 161.787 | 112.02 | 0.692× (-30.8%) | ms/op |
| PolyBenchmarks.polynomial | mutableEGraph=false, size=5, threadCount=1 | 97.3256 | 88.7649 | 0.912× (-8.8%) | ms/op |
| PolyBenchmarks.polynomial | mutableEGraph=false, size=5, threadCount=2 | 88.1037 | 80.3334 | 0.912× (-8.8%) | ms/op |
| PolyBenchmarks.polynomial | mutableEGraph=false, size=6, threadCount=1 | 445.165 | 412.7 | 0.927× (-7.3%) | ms/op |
| PolyBenchmarks.polynomial | mutableEGraph=false, size=6, threadCount=2 | 383.238 | 362.433 | 0.946× (-5.4%) | ms/op |
| PolyBenchmarks.polynomial | mutableEGraph=true, size=5, threadCount=1 | 41.2605 | 34.4317 | 0.834× (-16.6%) | ms/op |
| PolyBenchmarks.polynomial | mutableEGraph=true, size=5, threadCount=2 | 34.145 | 28.8319 | 0.844× (-15.6%) | ms/op |
| PolyBenchmarks.polynomial | mutableEGraph=true, size=6, threadCount=1 | 160.312 | 143.57 | 0.896× (-10.4%) | ms/op |
| PolyBenchmarks.polynomial | mutableEGraph=true, size=6, threadCount=2 | 134.745 | 115.145 | 0.855× (-14.5%) | ms/op |
| VectorBenchmarks.blinnPhong | mutableEGraph=false, threadCount=1 | 36129.6 | 34652.7 | 0.959× (-4.1%) | ms/op |
| VectorBenchmarks.blinnPhong | mutableEGraph=false, threadCount=2 | 34545 | 33643.4 | 0.974× (-2.6%) | ms/op |
| VectorBenchmarks.blinnPhong | mutableEGraph=true, threadCount=1 | 5339.89 | 4640.81 | 0.869× (-13.1%) | ms/op |
| VectorBenchmarks.blinnPhong | mutableEGraph=true, threadCount=2 | 4296.27 | 3364.05 | 0.783× (-21.7%) | ms/op |
| VectorBenchmarks.gramSchmidt | mutableEGraph=false, threadCount=1 | 542.232 | 464.929 | 0.857× (-14.3%) | ms/op |
| VectorBenchmarks.gramSchmidt | mutableEGraph=false, threadCount=2 | 445.549 | 391.76 | 0.879× (-12.1%) | ms/op |
| VectorBenchmarks.gramSchmidt | mutableEGraph=true, threadCount=1 | 208.404 | 168.738 | 0.810× (-19.0%) | ms/op |
| VectorBenchmarks.gramSchmidt | mutableEGraph=true, threadCount=2 | 167.778 | 138.516 | 0.826× (-17.4%) | ms/op |
| VectorBenchmarks.reflection | mutableEGraph=false, threadCount=1 | 321.673 | 306.639 | 0.953× (-4.7%) | ms/op |
| VectorBenchmarks.reflection | mutableEGraph=false, threadCount=2 | 266.93 | 243.992 | 0.914× (-8.6%) | ms/op |
| VectorBenchmarks.reflection | mutableEGraph=true, threadCount=1 | 136.91 | 123.608 | 0.903× (-9.7%) | ms/op |
| VectorBenchmarks.reflection | mutableEGraph=true, threadCount=2 | 104.531 | 84.4015 | 0.807× (-19.3%) | ms/op |
| VectorBenchmarks.vectorNormalization | mutableEGraph=false, threadCount=1 | 4.24104 | 3.83662 | 0.905× (-9.5%) | ms/op |
| VectorBenchmarks.vectorNormalization | mutableEGraph=false, threadCount=2 | 4.7103 | 4.34356 | 0.922× (-7.8%) | ms/op |
| VectorBenchmarks.vectorNormalization | mutableEGraph=true, threadCount=1 | 1.95671 | 1.64295 | 0.840× (-16.0%) | ms/op |
| VectorBenchmarks.vectorNormalization | mutableEGraph=true, threadCount=2 | 2.53502 | 2.30435 | 0.909× (-9.1%) | ms/op |
| Geomean | threadCount=1, mutableEGraph=false | | | 0.905× (-9.5%) | |
| Geomean | threadCount=1, mutableEGraph=true | | | 0.884× (-11.6%) | |
| Geomean | threadCount=2, mutableEGraph=false | | | 0.917× (-8.3%) | |
| Geomean | threadCount=2, mutableEGraph=true | | | 0.917× (-8.3%) | |

Note: < 1.0× means faster on PR; > 1.0× means slower.
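The Geomean rows condense many Δ ratios into one factor. A geometric mean suits ratios because a 2× slowdown and a 2× speedup cancel to 1.0×, which an arithmetic mean would not do. The Python sketch below shows that arithmetic on one slice of the table (the three `MatmulBenchmarks.nmm` rows with mutableEGraph=true, threadCount=2). The helper names `geomean` and `as_percent` are hypothetical, and since the comment does not say exactly which rows feed each Geomean line, this sketch does not try to reproduce those rows.

```python
import math


def geomean(ratios):
    """Geometric mean of PR/baseline ratios, computed in log space."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def as_percent(ratio):
    """Express a ratio as a signed percentage change, as in the Δ column."""
    return (ratio - 1.0) * 100.0


# Δ ratios for MatmulBenchmarks.nmm, mutableEGraph=true, threadCount=2 (sizes 20, 40, 80).
nmm = [0.738, 0.708, 0.692]
g = geomean(nmm)
print(f"{g:.3f}× ({as_percent(g):+.1f}%)")  # 0.712× (-28.8%)
```

Averaging logarithms rather than multiplying the raw ratios keeps the computation numerically stable when many benchmarks are combined.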

jonathanvdc changed the title from "Fuse parallel rule iterations" to "Replace Command with CommandSchedule" on Nov 9, 2025
jonathanvdc merged commit 2b9955e into main on Nov 9, 2025
2 checks passed
jonathanvdc deleted the parallel-rule-iteration-fusion branch on November 9, 2025, 04:38